Triple Indexing: An Efficient Technique for Fast Phrase Query Evaluation

Authors

  • Shashank Gugnani
  • Rajendra Kumar Roul
  • Gennady Antoshenkov
  • Mohamed Ziauddin
  • Dirk Bahle
  • Hugh E Williams
  • Renaud Delbru
  • Stephane Campinas
  • Khaled M Hammouda
  • Wen-Chiao Hsu
  • Dik L Lee
  • Huei Chuang
  • Lipyeow Lim
  • Min Wang
  • Sriram Padmanabhan
  • Jeffrey Scott Vitter
  • Ajit Kumar Mahapatra
  • Manish Patil
  • Sharma V Thankachan
  • Rahul Shah
  • Wing-Kai Hon
Abstract

Phrase query evaluation is an important task for every search engine, and reducing the evaluation time for phrase queries is a major challenge for current search engines. Phrase queries are costly for standard indexing techniques because merging the posting lists and checking the word ordering take considerable time. This paper proposes a new technique, called Triple Indexing, for indexing web documents; it reduces query evaluation time for phrase queries by cutting the time spent merging posting lists and checking word ordering. In addition, a procedure for document ranking using an extended vector space model is put forward. The 4 Universities dataset and the Industry Sector dataset of Carnegie Mellon University were used for the experiments, and on a modern machine the proposed method reduced the query time for phrase queries by almost 50 percent compared to a standard inverted index.
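The cost described in the abstract, merging posting lists and then checking word order, can be illustrated with a minimal Python sketch. With `n=1` the code behaves like a standard positional inverted index. With `n=3` it indexes runs of three consecutive words, which is only one hypothetical reading of the name "Triple Indexing"; the abstract does not describe the actual data structure, so this is not the authors' method. The names `build_index`, `phrase_query`, and `DOCS` are assumptions made for the illustration.

```python
from collections import defaultdict


def build_index(docs, n):
    """Map each run of n consecutive words to {doc_id: [start positions]}.

    n=1 gives a standard positional inverted index.
    n=3 gives a word-triple index (a hypothetical illustration only).
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        words = text.lower().split()
        for pos in range(len(words) - n + 1):
            index[tuple(words[pos:pos + n])][doc_id].append(pos)
    return index


def phrase_query(index, phrase, n):
    """Return the ids of documents that contain the phrase exactly."""
    words = phrase.lower().split()
    if len(words) < n:
        raise ValueError(f"phrase must contain at least {n} word(s)")
    # One posting list per overlapping run of n words in the phrase.
    keys = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if any(key not in index for key in keys):
        return set()

    # Merge step: keep only documents that appear in every posting list.
    candidates = set(index[keys[0]])
    for key in keys[1:]:
        candidates &= set(index[key])

    # Order check: the i-th key must start exactly i positions after the first.
    hits = set()
    for doc_id in candidates:
        later = [set(index[key][doc_id]) for key in keys[1:]]
        for start in index[keys[0]][doc_id]:
            if all(start + i + 1 in positions for i, positions in enumerate(later)):
                hits.add(doc_id)
                break
    return hits


# Toy collection, invented for this example.
DOCS = {
    1: "fast phrase query evaluation",
    2: "query evaluation of a fast phrase",
    3: "phrase query evaluation time",
}

word_index = build_index(DOCS, 1)    # baseline: one posting list per word
triple_index = build_index(DOCS, 3)  # hypothetical: one posting list per word triple

print(phrase_query(word_index, "phrase query evaluation", 1))
print(phrase_query(triple_index, "phrase query evaluation", 3))
```

In this sketch a three-word phrase touches three posting lists when `n=1` but only one when `n=3`, so the merge and order check have less to do. The trade-off is a larger index, since every distinct word triple gets its own posting list.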

Similar resources

Multiway-Tree Retrieval Based on Treegrams

Large tree databases are becoming more and more important as knowledge repositories; a prominent example is the treebanks of computational linguistics: text corpora consisting of up to five million words tagged with syntactic information. Consequently, these large amounts of structured data pose the problem of fast tree retrieval: Given a database T of labeled multiway trees and a query tree q, find...

Reflections on Image Indexing: A Picture Is Worth a Thousand Words

Purpose: This paper presents various image indexing techniques and discusses their advantages and limitations. Methodology: Through a review of the literature, it identifies three main image indexing techniques, namely concept-based image indexing, content-based image indexing and folksonomy. It then describes each technique. Findings: Concept-based image indexing is te...

Textual Document Indexing and Retrieval via Knowledge Sources and Data Mining

We present a knowledge-based query expansion technique to improve document retrieval effectiveness. The general concept terms in a query are substituted by a set of specific concept terms used in the corpus that co-occur with the key query concept. Since the expanded query matches with the document index terms much better, experimental results reveal that such query expansion produces better re...

Lucene for n-grams using the CLUEWeb Collection

The ARSC team made modifications to the Apache Lucene engine to accommodate "go words," taken from the Google Gigaword vocabulary of n‐grams. Indexing the Category "B" subset of the ClueWeb collection was accomplished by a divide and conquer method, working across the separate ClueWeb subsets for 1, 2 and 3‐grams. Phrase searching—or imposing an order on query terms—has traditionally been a...

A role-free approach to indexing large RDF data sets in secondary memory for efficient SPARQL evaluation

Massive RDF data sets are becoming commonplace. RDF data is typically generated in social semantic domains (such as personal information management [2, 11, 13]) wherein a fixed schema is often not available a priori. We propose a simple Three-way Triple Tree (TripleT) secondary-memory indexing technique to facilitate efficient SPARQL query evaluation on such data sets. The novelty of TripleT is...

Publication date: 2014